Artificial Visual Speech, Synchronized with a Speech Synthesis System
Authors
Abstract
This paper describes a new approach to modeling visual speech based on an artificial neural network (ANN). The network architecture allows linguistic expert knowledge to be fused into the ANN. The goal is the development of a computer-animation program as a training aid for learning lip-reading. The current PC version can synchronize the animation program with a special stand-alone speech-synthesis computer via a Centronics parallel interface.
Similar references
FSM and k-nearest-neighbor for corpus based video-realistic audio-visual synthesis
In this paper we introduce a corpus-based 2D video-realistic audio-visual synthesis system. The system combines a concatenative Text-to-Speech (TTS) system with a concatenative Text-to-Visual (TTV) system into a lip-movement-synchronized Text-to-Audio-Visual-Speech (TTAVS) system. For the concatenative TTS we use a Finite State Machine approach to select non-uniform variable-size audio ...
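The k-nearest-neighbor part of such a unit-selection system can be sketched as follows: for each target audio frame, the k stored corpus units with the closest audio features are retrieved as candidate video units. This is a minimal illustration with hypothetical feature dimensions and random data, not the paper's actual selection criteria.

```python
import numpy as np

# Hypothetical corpus: each stored unit pairs an audio feature vector
# with an index into the recorded video frames.
rng = np.random.default_rng(0)
corpus_audio = rng.normal(size=(100, 12))   # e.g. 12-dim audio features
corpus_video_ids = np.arange(100)

def select_units(target_audio, k=3):
    """For each target audio frame, return the k stored units whose
    audio features are closest in Euclidean distance."""
    dists = np.linalg.norm(
        corpus_audio[None, :, :] - target_audio[:, None, :], axis=2)
    return np.argsort(dists, axis=1)[:, :k]  # candidate video unit ids

target = rng.normal(size=(5, 12))           # 5 target audio frames
candidates = select_units(target)
print(candidates.shape)                     # (5, 3)
```

A real system would additionally score candidates with a concatenation cost so that adjacent video units join smoothly.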
HMM-based text-to-audio-visual speech synthesis
This paper describes a technique for text-to-audio-visual speech synthesis based on hidden Markov models (HMMs), in which lip-image sequences are modeled with an image- or pixel-based approach. To reduce the dimensionality of the visual speech feature space, we obtain a set of orthogonal vectors (eigenlips) by principal component analysis (PCA), and use a subset of the PCA coefficients and their dy...
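The eigenlip idea can be sketched with plain PCA via an SVD: center the flattened lip images, keep the leading principal components as orthogonal basis images ("eigenlips"), and use the projection coefficients as a low-dimensional visual feature. Image size, subspace dimension, and the random data are assumptions for illustration only.

```python
import numpy as np

# Hypothetical data: 200 lip images flattened to 32*32 = 1024 pixels.
rng = np.random.default_rng(1)
images = rng.normal(size=(200, 1024))

# Center the data; the rows of vt are the principal components
# ("eigenlips") of the centered image matrix.
mean_lip = images.mean(axis=0)
centered = images - mean_lip
_, _, vt = np.linalg.svd(centered, full_matrices=False)
eigenlips = vt[:16]                    # keep a 16-dim subspace

coeffs = centered @ eigenlips.T        # low-dimensional visual features
reconstruction = coeffs @ eigenlips + mean_lip
```

The HMMs then model these coefficients (and their dynamics) instead of raw pixels, which keeps the observation dimension tractable.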
A Framework for Data-driven Video-realistic Audio-visual Speech-synthesis
In this work, we present a framework for generating a video-realistic audio-visual "Talking Head", which can be integrated into applications as a natural human-computer interface where audio alone is not an appropriate output channel, especially in noisy environments. Our work is based on 2D video-frame concatenative visual synthesis and a unit-selection-based Text-to-Speech system. In order to ...
Visual synthesis of source acoustic speech through Kohonen neural networks
The objective of bimodal (audio-video) synthesis of acoustic speech has been addressed through Kohonen neural architectures charged with associating acoustic input parameters (cepstrum coefficients) with articulatory estimates. This association is done in real time, allowing the synchronized presentation of the source acoustic speech together with a coherent articulatory visualization. Different...
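One common way to realize such an association is a heteroassociative Kohonen map: a self-organizing map is trained on the cepstral vectors while a parallel articulatory codebook is updated with the same neighborhood function, so that at lookup time a cepstral frame's best-matching unit yields an articulatory estimate. The following is a minimal 1-D sketch with synthetic data and assumed dimensions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical training pairs: a cepstral frame (8-dim) and the
# articulatory parameters (3-dim, e.g. lip opening/width/protrusion)
# observed for the same frame.
cep = rng.normal(size=(500, 8))
art = np.tanh(cep[:, :3])              # synthetic articulatory targets

# Minimal 1-D Kohonen map trained on the cepstral vectors, with a
# parallel articulatory codebook updated by the same neighborhood.
n_nodes = 25
weights = rng.normal(size=(n_nodes, 8))
art_codebook = np.zeros((n_nodes, 3))

n_steps = 2000
for t in range(n_steps):
    i = rng.integers(len(cep))
    x, y = cep[i], art[i]
    bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    lr = 0.5 * (1 - t / n_steps)                       # decaying rate
    sigma = max(1.0, 5.0 * (1 - t / n_steps))          # shrinking radius
    h = np.exp(-((np.arange(n_nodes) - bmu) ** 2) / (2 * sigma ** 2))
    weights += lr * h[:, None] * (x - weights)
    art_codebook += lr * h[:, None] * (y - art_codebook)

def cepstrum_to_articulation(x):
    """Map a cepstral frame to articulatory estimates via its BMU."""
    bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    return art_codebook[bmu]
```

Because the lookup is a single nearest-node search, the mapping is cheap enough to drive an articulatory display in real time alongside the source audio.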
Text-to-audio-visual speech synthesis based on parameter generation from HMM
This paper describes a technique for synthesizing auditory speech and lip motion from an arbitrarily given text. The technique is an extension of a visual speech synthesis technique based on an algorithm for parameter generation from HMMs with dynamic features. The audio and visual features of each speech unit are modeled by a single HMM. Since both audio and visual parameters are generated simultan...